It is well known that Canada is a big player in renewable energy, but how much of its electricity is generated from renewables? And how has this changed over time with the addition of new renewable technologies? In this project we analyze Canadian electricity generation data to answer those questions and get a clear picture of the country's energy scene in 2020.
To do the analysis, we are going to use three datasets:
Canadian Electricity Statistics from 1950 to 2007, a compilation of monthly electricity generation for each province and for the whole country, broken down by generation type. The dataset can be downloaded from here.
Canadian Electricity Generation from 2008 to 2020, the continuation of the previous dataset, which also consists of monthly generation data for the provinces and the country. The dataset can be downloaded from here.
BP's Statistical Review of World Energy, a worldwide compilation of energy statistics published by the multinational company. It contains electricity generation by sector for different countries. We are going to use it to get some information on Canadian renewables. The dataset can be downloaded from here.
The Canadian Electricity Statistics from 1950 to 2007 and Canadian Electricity Generation from 2008 to 2020 datasets are licensed under the Open Government Licence – Canada.
Now, let's import the three previously mentioned datasets:
import pandas as pd
import numpy as np
import plotly.express as px
import json
import plotly
plotly.offline.init_notebook_mode()
pd.set_option("display.max_columns", 30) # to have a better display of the dataframes
energy50_07 = pd.read_csv(r'data/25100001.csv', parse_dates =["REF_DATE"], index_col ="REF_DATE")
energy50_07.head()
energy08_20 = pd.read_csv('data/25100015.csv', parse_dates =["REF_DATE"], index_col ="REF_DATE")
energy08_20.head()
This dataset was extracted from the original xls file and keeps only the wind and biomass data.
wind_biomass = pd.read_csv('data/wind_biomass.csv',index_col ="REF_DATE")
wind_biomass.head()
Because we want to analyze the energy scene in Canada in 2020 (for the country as a whole and by province) and how it has changed over time, we will generate three sub-dataframes from the three we just imported.
First, we are going to isolate the total electricity generation and remove the unnecessary columns, leaving only the province, the type of electricity generation and the generation value.
provinces2020 = energy08_20[energy08_20['Class of electricity producer'] == 'Total all classes of electricity producer']
provinces2020 = provinces2020[['GEO','Type of electricity generation','VALUE']]
provinces2020.head()
Then, we are going to reshape the dataframe so that the types of electricity generation become the column headers.
# set_index to generate a multiindex and preserve the province name, unstack to reshape the dataframe
provinces2020 = (provinces2020.set_index(['GEO',provinces2020.index, 'Type of electricity generation'])
.unstack('Type of electricity generation'))
# converting from megawatt-hours (MWh) to gigawatt-hours (GWh)
provinces2020 = provinces2020/1000
provinces2020.head()
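To make the reshaping step easier to follow, here is a minimal sketch of the same `set_index` plus `unstack` pattern on a tiny, made-up dataframe. The province values are hypothetical and only mimic the column layout of the real dataset.

```python
import pandas as pd

# Hypothetical long-format sample mimicking the GEO / type / VALUE layout
long_df = pd.DataFrame(
    {
        "GEO": ["Quebec", "Quebec", "Ontario", "Ontario"],
        "Type of electricity generation": ["Hydraulic turbine", "Wind power turbine"] * 2,
        "VALUE": [1000.0, 200.0, 300.0, 100.0],
    },
    index=pd.DatetimeIndex(["2020-01-01"] * 4, name="REF_DATE"),
)

# Move GEO, the date and the generation type into the index,
# then pivot the generation type level into the columns
wide_df = long_df.set_index(
    ["GEO", long_df.index, "Type of electricity generation"]
).unstack("Type of electricity generation")

print(wide_df)
```

The result has one row per province and date, and a two-level column header (`VALUE` on top, the generation type below), which is why an extra column level has to be removed later on.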
With the reshaped dataframe, we now sum the monthly values to obtain yearly totals and then keep only the 2020 electricity generation.
# groupby to sum the values per year and per province, using grouper to determine this criteria
provinces2020 = provinces2020.groupby([pd.Grouper(level='REF_DATE', freq='Y'),
pd.Grouper(level='GEO')]
).sum()
# xs to create a cross-section just of 2020 electricity generation
provinces2020 = provinces2020.xs('2020-12-31', level=0)
# generation of the second subdataframe by isolating the country values.
canada_2020 = provinces2020.loc['Canada',:]
# removal of country values to have just provinces values.
provinces2020.drop('Canada', inplace=True)
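The yearly aggregation and cross-section above can be sketched on made-up numbers as follows. This version groups on the year number instead of a `pd.Grouper` frequency alias, which is an alternative chosen here for the illustration, not the notebook's own approach; all values are hypothetical.

```python
import pandas as pd

# Hypothetical monthly values for two regions, indexed by (GEO, REF_DATE)
idx = pd.MultiIndex.from_product(
    [["Canada", "Quebec"], pd.to_datetime(["2019-12-01", "2020-01-01", "2020-02-01"])],
    names=["GEO", "REF_DATE"],
)
monthly = pd.DataFrame({"Hydraulic turbine": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]}, index=idx)

# Build the grouping keys from the index levels: calendar year and region
years = monthly.index.get_level_values("REF_DATE").year.rename("year")
regions = monthly.index.get_level_values("GEO")

# Sum the monthly values per year and per region
yearly = monthly.groupby([years, regions]).sum()

# Cross-section for a single year, leaving GEO as the index
totals_2020 = yearly.xs(2020, level="year")
print(totals_2020)
```

After the cross-section only the region level remains in the index, so the country row and the province rows can be separated with `.loc` and `.drop`, as done above.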
Finally, we are going to tidy up the sub-dataframe and remove some unnecessary columns.
# unstacking added an extra level to the column headers, so we remove that level
provinces2020.columns = provinces2020.columns.droplevel(0)
# removal of the unnecessary columns
provinces2020.drop(columns=['Combustion turbine', 'Conventional steam turbine', 'Internal combustion turbine',
'Tidal power turbine', 'Total electricity production from combustible fuels',
'Other types of electricity generation'], axis=1, inplace=True)
# rename of the columns to have a better format
provinces2020.rename(columns={'Hydraulic turbine':'Hydraulic', 'Nuclear steam turbine':'Nuclear',
'Total electricity production from biomass':'Biomass', 'Wind power turbine':'Wind',
'Total electricity production from non-renewable combustible fuels':'Fossil fuels',
'Total all types of electricity generation':'Total generation'}, inplace=True)
provinces2020
The national sub-dataframe generated above also needs some formatting, and its unnecessary values need to be dropped.
canada_2020
# dropping the extra index level of the pandas Series
canada_2020 = canada_2020.droplevel(level=0, axis=0)
canada_2020.drop(['Combustion turbine', 'Conventional steam turbine', 'Internal combustion turbine',
'Total all types of electricity generation', 'Tidal power turbine',
'Total electricity production from combustible fuels', 'Other types of electricity generation'], inplace=True)
canada_2020.rename({'Hydraulic turbine':'Hydraulic', 'Nuclear steam turbine':'Nuclear',
'Total electricity production from biomass':'Biomass', 'Wind power turbine':'Wind',
'Total electricity production from non-renewable combustible fuels':'Fossil fuels'}, inplace=True)
canada_2020
The analysis of electricity generation over time will be done at the country level, so the first step is to isolate the country values and remove the unnecessary columns.
energy50_07 = energy50_07[energy50_07['GEO'] == 'Canada']
energy50_07 = energy50_07[['Electric power, components', 'VALUE']]
Then, we are going to reshape the dataframe as before and select our columns of interest. After that, we resample the data from monthly to yearly.
# reshaping of the dataset by using pivot
energy50_07 = energy50_07.pivot(columns ='Electric power, components', values='VALUE')
energy50_07 = energy50_07[['Overall total generation',
'Total hydro generation',
'Total conventional steam generation',
'Total steam nuclear generation',
'Total internal combustion generation',
'Total combustion turbine generation']]
# resample, to go from month to years
energy50_07 = energy50_07.resample('A').sum()
energy50_07.tail()
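As a small illustration of the pivot and monthly-to-yearly step, the sketch below uses hypothetical values and sums by calendar year with `groupby` on the index year. This is an equivalent alternative to `resample` for plain yearly sums and is used here only to keep the example independent of frequency aliases.

```python
import pandas as pd

# Hypothetical long-format monthly records for a single region
dates = pd.to_datetime(
    ["2006-11-01", "2006-11-01", "2006-12-01", "2006-12-01", "2007-01-01", "2007-01-01"]
)
long_df = pd.DataFrame(
    {
        "Electric power, components": ["Total hydro generation", "Total steam nuclear generation"] * 3,
        "VALUE": [10.0, 4.0, 12.0, 5.0, 11.0, 6.0],
    },
    index=pd.DatetimeIndex(dates, name="REF_DATE"),
)

# One column per component, one row per month
wide = long_df.pivot(columns="Electric power, components", values="VALUE")

# Sum the monthly rows of each calendar year
yearly = wide.groupby(wide.index.year).sum()
print(yearly)
```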
Now, to make the merge easier, we are going to rename the columns so that they agree with the other datasets. The index is going to be changed from the full date to just the year. Also, the generation values are in megawatt-hours (MWh); we convert them to gigawatt-hours (GWh) to avoid displaying very large numbers.
# renaming of the columns for the merge
energy50_07.rename(columns={'Total hydro generation':'Hydraulic turbine',
'Total conventional steam generation':'Conventional steam turbine',
'Total steam nuclear generation':'Nuclear steam turbine',
'Total internal combustion generation':'Internal combustion turbine',
'Total combustion turbine generation':'Combustion turbine'}, inplace=True)
# replacing the date in the index with just the year
energy50_07.set_index(keys=energy50_07.index.year, inplace=True)
# converting from megawatt-hours (MWh) to gigawatt-hours (GWh)
energy50_07 = energy50_07/1000
energy50_07.tail()
We are going to repeat the same cleaning applied to the 1950 to 2007 dataset.
# mask to filter the values that are total generation and from the country
energy08_19 = energy08_20[(energy08_20['Class of electricity producer'] == 'Total all classes of electricity producer')
& (energy08_20['GEO'] == 'Canada')]
# isolating important columns
energy08_19 = energy08_19[['Type of electricity generation','VALUE']]
# reshaping of the dataset
energy08_19 = energy08_19.pivot(columns ='Type of electricity generation', values='VALUE')
# resampling from month to year
energy08_19 = energy08_19.resample('A').sum()
# rename of columns for consistency
energy08_19.rename(columns={'Total all types of electricity generation':'Overall total generation',
'Tidal power turbine':'Tidal', 'Wind power turbine':'Wind',
'Total electricity production from non-renewable combustible fuels': 'Total electricity production from fossil fuels'}, inplace=True)
# from date to year
energy08_19.set_index(keys=energy08_19.index.year, inplace=True)
# converting from mega to giga
energy08_19 = energy08_19/1000
# dropping 2020 because the year is not complete
energy08_19.drop(2020, inplace=True)
energy08_19.tail()
The merge process will begin with a left join of the 1950-2007 dataset with the Wind column of the wind_biomass dataset.
# conversion from terawatt-hours (TWh) to gigawatt-hours (GWh)
wind_biomass = wind_biomass*1000
# left join
energy50_19 = energy50_07.merge(right=wind_biomass.WIND, how='left', on='REF_DATE')
energy50_19.rename(columns={'WIND':'Wind'}, inplace=True)
With these values merged, and because the columns are now consistent between the datasets, we can perform a simple concatenation.
energy50_19 = pd.concat([energy50_19, energy08_19])
The last merge will be with the biomass column, which is not present in any of the datasets.
energy50_19= energy50_19.merge(right=wind_biomass.BIOMASS, how='left', on='REF_DATE')
energy50_19
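The left joins above keep every year of the left dataframe and leave a null value where the right side has no data for that year. A minimal sketch with hypothetical numbers, assuming both objects share an index named `REF_DATE` as in the notebook:

```python
import pandas as pd

# Hypothetical yearly totals, indexed by year under the name REF_DATE
base = pd.DataFrame(
    {"Hydraulic turbine": [100.0, 110.0, 120.0]},
    index=pd.Index([2005, 2006, 2007], name="REF_DATE"),
)

# Hypothetical wind series that starts one year later
wind = pd.Series(
    [1.5, 2.5], index=pd.Index([2006, 2007], name="REF_DATE"), name="WIND"
)

# Left join on the shared index name: all rows of `base` are kept
merged = base.merge(right=wind, how="left", on="REF_DATE")
print(merged)

# Years without wind data come out as NaN and can be filled afterwards
filled = merged.fillna(0)
```

Here 2005 has no wind value, so it appears as `NaN` after the merge and becomes 0 once filled.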
The merged dataset is complete, but it still needs some cleaning: generating a combined value for electricity generated from fossil fuels, renaming columns and retaining only the important ones.
Also, we are going to fill the null values with 0, because a null value here means that there wasn't any generation.
energy50_19.fillna(0, inplace=True)
# function to combine the different columns that report electricity generated from fossil fuels
def fossil_fuels(df):
    # df is a single row of the dataframe (the function is applied with axis=1)
    if df['Total electricity production from fossil fuels'] != 0:
        return df['Total electricity production from fossil fuels']
    else:
        if df['Total electricity production from combustible fuels'] == 0:
            return df['Conventional steam turbine'] + df['Internal combustion turbine'] + df['Combustion turbine']
        if df['Total electricity production from combustible fuels'] != 0:
            return df['Total electricity production from combustible fuels']
# application of the function
energy50_19['Total electricity production from fossil fuels'] = energy50_19.apply(fossil_fuels, axis=1)
energy50_19.rename(columns={'Hydraulic turbine':'Hydraulic', 'Nuclear steam turbine':'Nuclear',
'BIOMASS':'Biomass', 'Wind power turbine':'Wind',
'Total electricity production from fossil fuels':'Fossil fuels'}, inplace=True)
energy50_19.drop(['Other types of electricity generation', 'Total electricity production from biomass',
'Total electricity production from combustible fuels'], axis=1, inplace=True)
# combination of renewables without hydraulic
energy50_19['Renewables w/o Hydro'] = energy50_19[['Wind','Solar','Tidal', 'Biomass']].sum(axis=1)
energy50_19.tail()
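The row-wise fallback used for the fossil fuels column can also be written without `apply`. The following is a sketch of an equivalent vectorized version using `Series.mask`, run on hypothetical rows that cover the three cases; it is an alternative formulation, not the code used in this analysis.

```python
import pandas as pd

# Hypothetical rows: only turbine data, only the combustible total, and a fossil total
df = pd.DataFrame(
    {
        "Total electricity production from fossil fuels": [0.0, 0.0, 50.0],
        "Total electricity production from combustible fuels": [0.0, 40.0, 60.0],
        "Conventional steam turbine": [10.0, 0.0, 0.0],
        "Internal combustion turbine": [2.0, 0.0, 0.0],
        "Combustion turbine": [3.0, 0.0, 0.0],
    },
    index=[1990, 2010, 2018],
)

turbines = df[
    ["Conventional steam turbine", "Internal combustion turbine", "Combustion turbine"]
].sum(axis=1)
fossil = df["Total electricity production from fossil fuels"]
combustible = df["Total electricity production from combustible fuels"]

# mask(cond, other) replaces the values where cond is True with `other`
fallback = combustible.mask(combustible == 0, turbines)
result = fossil.mask(fossil == 0, fallback)
print(result)
```

The order of precedence is the same as in the function: the fossil fuels total if present, otherwise the combustible fuels total, otherwise the sum of the three turbine columns.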
The data analysis focuses on three aspects: electricity generation for the country as a whole in 2020, electricity generation over time from 1950 to 2019, and electricity generation per province in 2020.
With these aspects, we can get a clear picture of the energy sector in Canada and the role renewables play in it.
To analyze Canadian electricity generation at the country level, we are going to generate a donut chart with the proportion of electricity generated by each source.
fig = px.pie(values=canada_2020, names=canada_2020.index, color=canada_2020.index,
title='Electricity generation in Canada by 2020', width=700, height=550,
color_discrete_map = {'Fossil fuels':'#FF595E', 'Hydraulic':'#1982C4', 'Nuclear':'#FFDE85', 'Wind':'#6C991E',
'Biomass':'#6A4C93', 'Solar':'#FA9F42'}, hole=.4, custom_data=[canada_2020.index])
fig.update_traces(textinfo='percent+label')
fig.update_traces(
hovertemplate="<b>Generation: %{value} GWh</b><extra></extra>")
fig.update_layout(
title={
'y':0.9,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'})
fig.update_layout(margin={'r':30, "b":30, 'l':0})
plotly.offline.iplot(fig, filename='electricity_proportion')
The main source of electricity in Canada in 2020 is the Hydraulic sector, which accounts for 60% of all electricity generated in the country. Fossil fuels follow with 17%, and Nuclear is the third largest source.
The other renewables have less impact on Canada's electricity generation: combined, they make up about 7% of all generation, with Wind as the main renewable besides Hydro.
Taken together, renewables account for 68% of all electricity generation in Canada, which means that well over half of the country's generation is renewable.
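The percentages quoted above are simple shares of the total. As a sketch of the arithmetic, the snippet below computes the share of each source and the combined renewable share from a Series shaped like `canada_2020`. The numbers are placeholders chosen for the illustration, not the values from the dataset.

```python
import pandas as pd

# Placeholder generation values in GWh (hypothetical, for illustration only)
generation = pd.Series(
    {
        "Hydraulic": 600.0,
        "Fossil fuels": 170.0,
        "Nuclear": 150.0,
        "Wind": 60.0,
        "Biomass": 15.0,
        "Solar": 5.0,
    }
)

# Sources counted as renewable in this analysis
renewable_sources = ["Hydraulic", "Wind", "Biomass", "Solar"]

# Share of each source as a percentage of the total
shares = generation / generation.sum() * 100

# Combined renewable share
renewable_share = shares[renewable_sources].sum()
print(shares.round(1))
print(round(renewable_share, 1))
```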
To analyze how electricity production changed from 1950 to 2019, we are going to generate an area chart of electricity generation by source. The non-hydraulic renewables are combined in the plot for a clearer display.
fig = px.area(energy50_19, x=energy50_19.index,
y=['Hydraulic', 'Fossil fuels','Nuclear', 'Renewables w/o Hydro'],
color_discrete_map={'Fossil fuels':'#FF595E', 'Hydraulic':'#1982C4', 'Nuclear':'#FFDE85', 'Renewables w/o Hydro':'#6C991E'},
title='Electricity generation by sector from 1950 to 2019', labels={'variable':'Energy sector'})
fig.update_traces(
hovertemplate='<b>%{fullData.name}:%{y}<extra></extra>')
fig.update_layout(xaxis={'title':'Year'},
yaxis={'title':'Electricity generation, GWh'},
title={
'y':0.85,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
plot_bgcolor="#F9F9F9",
hovermode='x')
plotly.offline.iplot(fig, filename='electricity_time')
As can be seen in the graph above, hydraulic power has always been the main source of electricity in Canada, and it kept growing from 1950 to 2019, peaking in 2017 at 390,000 gigawatt-hours (GWh).
The generation of electricity from fossil fuels started in 1977, and since then it has remained the second largest source of electricity in Canada. Fossil fuel generation peaked in the first half of the 2000s and peaked again in the last three years.
Nuclear electricity generation also started in 1977 and grew at a slow rate until 1994, when it peaked at 101,000 GWh. Since then it has had ups and downs, but overall its generation has remained roughly constant.
Renewable electricity generation excluding Hydraulic was low (less than 15,000 GWh) prior to 2010, but it increased dramatically in 2016, when it more than doubled. Let's dig into the reason for this big change by plotting only the electricity generation of the renewables over time.
fig = px.area(energy50_19.loc[1973:2019,:], x=energy50_19.loc[1973:2019,:].index, y=['Biomass','Wind','Solar','Tidal'],
color_discrete_map= {'Wind':'#6C991E', 'Biomass':'#6A4C93', 'Solar':'#FA9F42', 'Tidal':'#90e0ef'},
title='Renewable electricity generation by sector from 1973 to 2019', labels={'variable':'Renewables'})
fig.update_traces(
hovertemplate='<b>%{fullData.name}:%{y}<extra></extra>')
fig.update_layout(xaxis={'title':'Year'},
yaxis={'title':'Electricity generation, GWh'},
title={
'y':0.85,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'},
plot_bgcolor="#F9F9F9",
hovermode='x')
plotly.offline.iplot(fig, filename='electricity_renew_time')
The big increase in electricity produced by renewables is due to Wind turbines: as we can see above, this sector tripled its generation in just one year (2016). With that increase, Wind became the second most important renewable in Canada, just behind Hydraulic.
Biomass is the third most important renewable, and its generation started in 1974. The Biomass sector grew steadily until 2002, after which it plateaued at around 10,000 GWh.
Solar, on the other hand, started to produce significant amounts of electricity in 2011, but it was not until 2016, as with Wind, that the sector increased its generation up to 2,000 GWh.
The other renewable source of electricity in this study is Tidal. This source has never generated large amounts of electricity, with a maximum of 27 GWh. In 2019, the only Tidal power plant in Canada was shut down, so no more electricity comes from this kind of source.
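A jump like the one described for Wind can also be located numerically rather than read off the chart. The sketch below computes the year-over-year ratio of a yearly series and finds the year with the largest jump; the values are hypothetical and only reproduce the shape of a one-year tripling.

```python
import pandas as pd

# Hypothetical yearly wind generation in GWh
wind = pd.Series(
    [10.0, 11.0, 33.0, 34.0],
    index=pd.Index([2014, 2015, 2016, 2017], name="year"),
    name="Wind",
)

# Ratio of each year to the previous one (the first year has no predecessor)
ratio = wind / wind.shift(1)

# Year with the largest relative increase (NaN values are skipped)
jump_year = ratio.idxmax()
print(jump_year, ratio.loc[jump_year])
```

The same two lines could be applied to a column such as `energy50_19['Wind']`, keeping in mind that years with zero generation in the previous year produce infinite or undefined ratios.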
For the provincial electricity generation analysis, we are going to focus on three factors: total generation per province, the most important source of energy per province and the most important renewable besides Hydraulic per province.
To do this, we are going to create three choropleth maps, one for each factor. A choropleth map needs a json file with the shapes and coordinates of the provinces. The json file can be found in the GitHub repository.
But before we create the maps, we need to determine which source of energy is the main one for each province, and also which is the main renewable.
# function to classify the provinces by the most important energy source
def principal_energy(df):
    # df is a single row of the dataframe (the function is applied with axis=1)
    df = df[['Fossil fuels', 'Hydraulic', 'Nuclear', 'Wind', 'Biomass', 'Solar']]
    if df.max() == df['Hydraulic']:
        return 'Hydraulic'
    elif df.max() == df['Nuclear']:
        return 'Nuclear'
    elif df.max() == df['Solar']:
        return 'Solar'
    elif df.max() == df['Fossil fuels']:
        return 'Fossil fuels'
    elif df.max() == df['Wind']:
        return 'Wind'
    elif df.max() == df['Biomass']:
        return 'Biomass'
# function to classify the provinces by the most important renewable
def principal_renewables(df):
    if df[['Solar', 'Wind', 'Biomass']].max() == df['Wind']:
        return 'Wind'
    if df[['Solar', 'Wind', 'Biomass']].max() == df['Solar']:
        return 'Solar'
    if df[['Solar', 'Wind', 'Biomass']].max() == df['Biomass']:
        return 'Biomass'
provinces2020['Principal energy'] = provinces2020.apply(principal_energy, axis=1)
provinces2020['Principal renewables'] = provinces2020.apply(principal_renewables, axis=1)
provinces2020
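The same classification can be sketched more compactly with `DataFrame.idxmax`, which returns the column label holding the largest value in each row. The two provinces and their values below are hypothetical. Note that ties are resolved differently from the functions above: `idxmax` returns the first column in the given order that holds the maximum.

```python
import pandas as pd

# Hypothetical generation per source for two made-up provinces (GWh)
sample = pd.DataFrame(
    {
        "Fossil fuels": [5.0, 80.0],
        "Hydraulic": [90.0, 3.0],
        "Nuclear": [0.0, 0.0],
        "Wind": [4.0, 6.0],
        "Biomass": [1.0, 2.0],
        "Solar": [0.0, 1.0],
    },
    index=["Province A", "Province B"],
)

sources = ["Fossil fuels", "Hydraulic", "Nuclear", "Wind", "Biomass", "Solar"]
renewables = ["Solar", "Wind", "Biomass"]

# Label of the largest column per row, over all sources and over renewables only
sample["Principal energy"] = sample[sources].idxmax(axis=1)
sample["Principal renewables"] = sample[renewables].idxmax(axis=1)
print(sample[["Principal energy", "Principal renewables"]])
```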
# import and store the json file with the shapes, closing the file afterwards
with open(r'data/canada.json') as json_file:
    provinces = json.load(json_file)
#change of Yukon to match the json file
provinces2020.rename(index={'Yukon':'Yukon Territory'}, inplace=True)
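Because the map only colors the provinces whose index value matches a `properties.name` entry in the json file, it can help to check for mismatches before plotting. The helper below, `unmatched_names`, is a hypothetical function written for this illustration, and the tiny GeoJSON-like dictionary is made up; only the `properties.name` key follows the file used here.

```python
import pandas as pd

# Made-up GeoJSON-like structure with only the fields needed for the check
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Quebec"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Yukon Territory"}, "geometry": None},
    ],
}

# Hypothetical dataframe indexed by province name
frame = pd.DataFrame({"Total generation": [1.0, 2.0]}, index=["Quebec", "Yukon"])


def unmatched_names(df, geo, key="name"):
    """Return the index values of df that have no matching feature name in geo."""
    geo_names = {feature["properties"][key] for feature in geo["features"]}
    return sorted(set(df.index) - geo_names)


# Before renaming, 'Yukon' has no counterpart in the shapes
before = unmatched_names(frame, geojson)

# After renaming, every index value matches a feature
after = unmatched_names(frame.rename(index={"Yukon": "Yukon Territory"}), geojson)
print(before, after)
```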
#palette = plotly.colors.make_colorscale(["#b7e4c7","#95d5b2","#74c69d","#52b788","#40916c","#2d6a4f","#1b4332"])
fig = px.choropleth_mapbox(provinces2020, geojson=provinces, color="Total generation",
locations=provinces2020.index, featureidkey="properties.name",
center={"lat": 59, "lon": -102},
mapbox_style="carto-positron", zoom=2,
color_continuous_scale='YlOrRd',
opacity=.7,custom_data=[provinces2020.index, 'Principal energy','Total generation'],
title='Electricity generation per province, GWh')
fig.update_layout(margin={"b":30,'l':30,'t':50})
fig.update_traces(
hovertemplate='<b>%{customdata[0]}</b>' + "<br>Generation: %{customdata[2]:,} GWh" + "<br>Principal energy: %{customdata[1]}<extra></extra>")
fig.update_layout(title={
'y':0.95,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'})
fig.update_traces(marker_line_color='#D3D3D3')
plotly.offline.iplot(fig, filename='map_province')
The province with the largest electricity generation is Quebec, with around 122,000 GWh. It is followed by Ontario with 93,000 GWh. These two provinces account for 50% of all electricity generated in Canada so far in 2020, making Eastern Canada the most important region in terms of electricity generation.
The second most important region is Western Canada, with Alberta and British Columbia generating around 20% of the country's electricity.
The northern and central provinces are the ones that produce the least electricity for the country.
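The regional shares mentioned above come from dividing each province's total by the national total. A minimal sketch with hypothetical totals for four made-up provinces:

```python
import pandas as pd

# Hypothetical yearly totals per province in GWh
totals = pd.Series(
    {"Province A": 120.0, "Province B": 80.0, "Province C": 150.0, "Province D": 50.0},
    name="Total generation",
)

# Percentage of the overall total, largest first
share = (totals / totals.sum() * 100).sort_values(ascending=False)

# Combined share of the two largest producers
top_two_share = share.head(2).sum()
print(share)
print(top_two_share)
```

With the real data, `totals` would be the `Total generation` column of `provinces2020`.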
fig = px.choropleth_mapbox(provinces2020, geojson=provinces, color="Principal energy",
locations=provinces2020.index, featureidkey="properties.name",
center={"lat": 59, "lon": -102},
mapbox_style="carto-positron", zoom=2,
color_discrete_map = {'Fossil fuels':'#FF595E', 'Hydraulic':'#1982C4', 'Nuclear':'#FFDE85', 'Wind':'#6C991E'},
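# the third custom_data entry is the generation of the principal source, taken over the numeric source columns only
opacity=.7,custom_data=[provinces2020.index, 'Principal energy', provinces2020[['Fossil fuels', 'Hydraulic', 'Nuclear', 'Wind', 'Biomass', 'Solar']].max(axis=1)],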
title='Principal source of electricity per province')
fig.update_layout(margin={"b":30,'l':30,'t':50})
fig.update_traces(
hovertemplate='<b>%{customdata[0]}</b>' + "<br>Principal energy: %{customdata[1]}" + "<br>Generation: %{customdata[2]:,} GWh<extra></extra>")
fig.update_layout(title={
'y':0.95,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'})
fig.update_traces(marker_line_color='#D3D3D3')
plotly.offline.iplot(fig, filename='map_province_type')
As seen in the map above, electricity generation in Canada is quite diverse on a per-province basis, with Hydraulic and Fossil fuels each being the most important source of electricity in 5 provinces. This makes sense, as Hydraulic is the biggest source of energy in the country and Fossil fuels is the second.
The other two principal sources are Nuclear, in two provinces, and Wind, which is the most important source of electricity for Prince Edward Island.
fig = px.choropleth_mapbox(provinces2020, geojson=provinces, color="Principal renewables",
locations=provinces2020.index, featureidkey="properties.name",
center={"lat": 59, "lon": -102},
mapbox_style="carto-positron", zoom=2,
color_discrete_map = {'Wind':'#6C991E', 'Biomass':'#6A4C93' },
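# the third custom_data entry is the generation of the principal renewable, not of the principal source overall
opacity=.7,custom_data=[provinces2020.index, 'Principal renewables', provinces2020[['Solar', 'Wind', 'Biomass']].max(axis=1)],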
title='Principal renewables source of electricity per province')
fig.update_layout(margin={"b":30,'l':30,'t':50})
fig.update_traces(
hovertemplate='<b>%{customdata[0]}</b>' + "<br>Principal renewable: %{customdata[1]}" + "<br>Generation: %{customdata[2]:,} GWh<extra></extra>")
fig.update_layout(title={
'y':0.95,
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'})
fig.update_traces(marker_line_color='#D3D3D3')
plotly.offline.iplot(fig, filename='map_province_type_renew')
If we exclude Hydraulic generation, we end up with a map with little diversity in terms of renewable electricity sources. In almost every province the most important renewable source is Wind, which agrees with the fact that Wind is the second most important renewable in Canada.
The only province where Wind is not the main renewable source is British Columbia, where the main renewable source of electricity in 2020 is Biomass.
Canada is a "green" country in terms of electricity generation, as around 68% of its generation comes from renewable sources, mainly Hydraulic and Wind. This has been the norm for the country since the 1950s, as electricity generation from Hydraulic sources has been growing from then until today.
The Wind sector has expanded rapidly in recent years and, with new technology and investment, it will likely keep growing and become a more important source for the country. This also applies to solar energy, where there is a lot of room for development that has not really been exploited.
As the threat of global warming and climate change is right in front of us, it is important to keep investing in renewables and to decrease the use of fossil fuels as a source of energy, as some of the provinces of the country have done.